How strong was my statistical test:
Science never proves anything.  It must deal in probabilities.  This goes back to the nigh forgotten days of the medieval scholastic debates.  They have been parodied as discussing, “How many angels can dance on the head of a pin?”  The point, I am given to understand, is the question of just how much physical reality there is to an angel.  Opinions vary on that one.

For our purposes the critical argument was made by “Doctor Invincibilis,” William of Ockham.  His principle, with which he basically won the scholastic debates, was “entities must not be multiplied beyond necessity,” or more simply, if there are multiple possibilities the simplest one is preferred.  It is a good notion.  It is practical.  It is a foundation stone of modern science.  One always prefers the simplest explanation.

Does that mean the simplest explanation is true?  Of course it does not.  It merely means that it is preferred until more evidence makes another explanation necessary.  Newton’s gravity accounted for all available information at the time.  By the time of Einstein more information was available and a more complex physics was needed. 

In fact almost anything that can be explained can be explained in an unlimited number of ways.  Suppose you are presented with a chess playing machine.  It is probably a computer.  If it is big enough there could be a person inside making the decisions.  Or there could be a magic spell.  Or a trained rat.  Or three trained rats, a mechanical device and a turnip.  Ultimately you cannot know all the possible explanations.  Since there is a limit to what you can know, the chance that you can even imagine the truth is vanishingly small.

So it is not possible to prove anything in science.  If anyone shows you data and says that is proof, it is not.  If anyone shows you data and says it establishes that something is probably true, it does not.

What you can do, however, is attempt to disprove things.  It is seldom done explicitly in a professional article, but proper form is to set up a “null hypothesis.”  I have a theory.  The theory makes a prediction.  There is an implied counter theory, which is that the theory I am pushing is wrong.  This is the null hypothesis.  This also makes a prediction.  If there is no difference between the predictions of my theory and the predictions of the null hypothesis, then I do not have a theory.

Once you have established what the theory is and what the null hypothesis is, then you can look at the data.  If the data seem to support the theory, then you must consider that it could be a fluke.  I once had a friend who showed me a penny with tails on both sides.  Two coins had been ground down and then glued together to make what looked much like an honest coin, but it always came up tails.  So we toss a coin.  It comes up tails.  It could be an honest coin or a trick coin.  We have proved nothing.  So we toss it again; tails again.  The null hypothesis – honest coin – could still be right.  If it came down tails the first time there is still an even chance that it comes down tails again.  So we toss it 10 times.  Tails 10 times.  The chance of this with an honest coin is less than 1 in 1,000.  We write that p < .001.  That means the probability of this happening by chance is less than 1 in 1,000.  That is a strong test.  Toss it 20 times getting tails every time and we have p < .000001, or less than a one in a million chance that it is a fluke.  That is quite good.  It is not, however, absolute proof.  No statistical test can ever be absolute.
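
For anyone who cares to check that arithmetic, here is a minimal sketch in Python, my own illustration, assuming an honest coin and independent tosses:

    # Probability that an honest coin comes up tails on every one of n independent tosses.
    def p_all_tails(n_tosses):
        return 0.5 ** n_tosses

    print(p_all_tails(10))   # about 0.00098, so p < .001
    print(p_all_tails(20))   # about 0.00000095, less than one in a million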

It used to be that any test that was as good as p < .01 was good enough to publish and good enough to act on.  Nowadays p < .001 is considered very respectable, and if it goes to p < .0001 the author has bragging rights.  When I was in medical school they showed us a study in which it had been “proven” that licorice was a good treatment for peptic ulcers.  In those days we still gave sips of cream for ulcers.  That treatment went back to Roman times.  It has never been proven to be effective, although some say it relieved symptoms sometimes.  (We have now identified the bacterium that causes ulcers, so we are much better off.)  When I looked at the study it was immediately obvious that they had screened a lot of things.  Licorice was one of more than a hundred treatments that had been tried.  But it was only proven at the p < .01 level.  So I just did not believe it.  Of course if you try 100 things there is a good chance that one of them will seem to work at that confidence level.  Modern statistics has caught up with the situation and that particular mistake is, I think, no longer made.
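
Here is a sketch of why that kind of screening is a trap, assuming, as a simplification of my own, that the hundred treatments were tested independently and each was judged at the p < .01 level:

    # Chance that at least one of 100 useless treatments clears p < .01 purely by luck,
    # assuming the trials are independent.
    n_treatments = 100
    alpha = 0.01
    p_false_positive = 1 - (1 - alpha) ** n_treatments
    print(p_false_positive)   # about 0.63, so a spurious "cure" is more likely than not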

In my last posting I showed a graph that represented my theory and one that represented real data.  What is the chance that it was just a fluke?  Both the null hypothesis and the theory show a rapid rise in fertility with rise in population size for small populations.  So there is nothing to test at that point.  But with modestly larger populations there is a difference.  While the null hypothesis shows no further change or conceivably a slight rise in fertility with larger populations, the theory predicts that fertility:

  1. goes back down
  2. almost as quickly as it rose
  3. levels off
  4. below replacement.

Four predictions are made.  All were borne out in practice.  Since each prediction could be true or false, we will take true to be an even chance.  So there is a one in two chance of the first being true, a one in four chance of the first and second being true, a one in eight chance of the first three being true and a one in sixteen chance of all four being true.  That does not sound very good.  About the best you can say is p < .1.  I would hardly be troubling you about something that had a one in ten chance of just being a fluke.  I’ll bet you can make up ten theories in as many minutes.
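
If you want that arithmetic spelled out, here is a small sketch, assuming, as I did above, that each prediction is an independent even chance:

    # Chance that four independent even-chance predictions all come out right by luck.
    n_predictions = 4
    p_by_chance = 0.5 ** n_predictions
    print(p_by_chance)   # 0.0625, which I round up conservatively to p < .1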

But if there were two field studies and both agreed with the prediction, then the chance of it being a fluke falls to p < .01.  This should be enough to make a doubter squirm.  But in fact the authors put together more than 1,000 field studies; they virtually exhausted the field of natural history pulling out data.  They found this everywhere.  I am sure some results looked better than others, but the p < .1 for one study is already pretty conservative.  Take a thousand studies together and you get

p < .0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001.
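
That number is just the one-in-ten chance multiplied by itself a thousand times.  Here is a sketch of the calculation, assuming, conservatively and as a simplification of my own, a thousand independent studies at p < .1 each; the product is too small for an ordinary floating point number, so it is done in logarithms:

    import math

    # Combined p for 1,000 independent studies, each conservatively taken at p < .1.
    # 0.1 ** 1000 underflows an ordinary float, so work in base-10 logarithms instead.
    n_studies = 1000
    p_per_study = 0.1
    log10_combined_p = n_studies * math.log10(p_per_study)
    print(log10_combined_p)   # about -1000, i.e. the combined p is below 10 to the minus 1,000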

I am very confident that we must reject the null hypothesis.  And that is before we even confront the massive human data. 
